Go: add experimental query for IDNA digit-fold IP-literal smuggling by astrogilda · Pull Request #21784 · github/codeql

astrogilda · 2026-05-04T06:43:32Z

Summary

Adds a new experimental Go query go/idna-ip-literal-smuggle under go/ql/src/experimental/CWE-918/ that detects the UTS-46 IDNA digit-fold IP-literal smuggling anti-pattern.
Stateful taint configuration (TaintTracking::GlobalWithState) with two flow states (TPreIdna, TPostIdna) and a barrier that requires a trailing-dot trim followed by net.ParseIP / netip.ParseAddr to clear post-IDNA flow.
Inline-expectation tests with 23 unique sink alerts on the positive fixture and 1 intended-positive on the adversarial-witness binding case; 0 alerts on the negative fixture.

Motivation

golang.org/x/net/idna applies UTS-46 NFKC mapping inside (*Profile).ToASCII and (*Profile).ToUnicode for the Lookup, Display, Registration, and MapForLookup-derived profiles. The mapping folds 100 distinct non-ASCII Unicode digit codepoints partitioned into seven Unicode-block ranges (Latin-1 superscripts, mathematical superscripts, mathematical subscripts, circled digits, fullwidth digits, the Mathematical Alphanumeric Symbols block covering bold, double-struck, sans-serif, sans-serif-bold, and monospace digit styles, and segmented digits) to their ASCII equivalents. Devanagari digits (U+0966..U+096F) are not in scope: empirically verified against golang.org/x/net/idna v0.53.0, they pass through Punycode rather than fold to ASCII. The library contains no IP-literal detection, so a caller that runs net.ParseIP only before idna.ToASCII will accept a smuggled IPv4 literal such as "0.¹.0.0" (which maps to "0.1.0.0") into a downstream network sink that assumed the value was a domain name.

Single-state taint tracking is structurally insufficient: a pre-IDNA net.ParseIP barrier must NOT block flow that transitions through the IDNA call. The query therefore uses two flow states. Sources start in TPreIdna. The IDNA mapping call is modeled as a state-transition step that flips TPreIdna -> TPostIdna. Sinks are flagged only in TPostIdna. The barrier is net.ParseIP / net.ParseCIDR / netip.ParseAddr / netip.ParsePrefix, but only when the value reaching the parser was produced by a trailing-dot trim (strings.TrimRight(_, "."), strings.TrimSuffix(_, "."), or the manual if strings.HasSuffix(out, ".") { out = out[:len(out)-1] } slice form) of the IDNA output. The trim is required because "0.¹.0.0." maps to "0.1.0.0.", which net.ParseIP rejects on its own yet is still an IP literal for routing purposes.

The detection-strategy design choice (CodeQL inter-procedural taint as the primary recall vehicle, with Semgrep OSS as a high-precision direct-call secondary sweep) is documented in the upstream rules repo at docs/research/v0.1-detection-strategy.md. The short version: CodeQL TaintTracking::GlobalWithState is inter-procedural by default and propagates through one-deep wrappers like idnaASCII without an explicit isAdditionalFlowStep. That matters because the only two production callsites of any UTS-46-mapping profile across golang/go, kubernetes/kubernetes, and prometheus/prometheus (a 660 MB sweep) are wrapped by such a helper. A blanket structural rule of the form "ToASCII without recheck in same block" was considered and rejected: an empirical sweep across 31 production callsites in 19 repositories shows the rule would fire on PSL walkers, registrar pipelines, and TLS-SNI normalization sites where the mapped value never reaches a network-routing decision driven by attacker input, and the false-positive density is too high to ship as a default.

Scope is IPv4 only. IPv6 colons are rejected by IDNA rune-validation before UTS-46 mapping runs, so no IPv6 smuggle path exists.

Verification

Compile and unit-test commands (run from a working tree of github/codeql, with the staged files in place):

codeql query compile go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.ql
codeql test run     go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/

Local results from CodeQL CLI 2.25.3:

query compile: clean (1/1 success, 17 s).
test run: positive fixture produces 23 unique sink alerts plus 1 intended-positive on the adversarial-witness binding case; negative fixture produces 0 alerts. Inline-expectations match.

The positive fixture exercises the canonical wrapper shape modeled on golang.org/x/net/http/httpproxy.canonicalAddr (a one-deep idnaASCII helper consumed by both net.JoinHostPort and net.Dial-class sinks). The query fires twice on this shape, once per sink. That is the empirical evidence that TaintTracking::GlobalWithState propagates through the wrapper with no additional-flow-step model required.

The negative fixture covers the safe pattern (post-IDNA strings.TrimRight / strings.TrimSuffix / manual HasSuffix slice followed by net.ParseIP or netip.ParseAddr), the Punycode profile (which has a nil mapping function and is correctly excluded from the idnaMappingCall predicate), and unrelated hostname use that never transits IDNA.

Disposition by the upstream Go security team

The Go security team reviewed the underlying digit-fold behavior and declined to treat it as a library bug on 2026-04-29: their position is that UTS-46 is correctly implemented and the recheck obligation belongs to the caller. This query implements the caller-side detection that follows from that disposition. It is not a substitute for a library fix; it is the alert that surfaces the contract on the call site where the recheck is missing.

Source artefacts

The query, library, qhelp, examples, and test fixtures originate from the upstream rules repository astrogilda/idna-ip-literal-smuggle-rules (codeql/ subdirectory). The upstream repo also carries the corpus-eval scripts, the v0.1.x detection-strategy synthesis under docs/research/, and complementary Semgrep and gopatch rules for the same anti-pattern.

CLA

I will sign the CLA on first PR comment per the registry workflow.

References

Unicode Technical Standard JavaScript: Add library for HTML sanitizers #46 (IDNA Compatibility Processing): https://www.unicode.org/reports/tr46/
golang.org/x/net/idna package documentation: https://pkg.go.dev/golang.org/x/net/idna
WHATWG URL Standard, ends-in-a-number host parser check: https://url.spec.whatwg.org/#ends-in-a-number-checker
CWE-918 (Server-Side Request Forgery): https://cwe.mitre.org/data/definitions/918.html
CWE-020 (Improper Input Validation): https://cwe.mitre.org/data/definitions/20.html

Cross-language analogs of the same caller-side IP-literal-validation gap that this query addresses for Go IDNA:

CVE-2021-29923: Go net.ParseIP accepted IPv4 octets with leading zeros, allowing SSRF allowlist bypass via octal interpretation downstream.
CVE-2024-12224: Caller-side IP-literal validation gap producing comparable allowlist bypass behavior.
CVE-2024-3651: idna (Python) UTS-46 / Punycode handling input that drove disproportionate parser cost; same library family, different failure mode but illustrates the recurring pattern of UTS-46 input requiring caller-side validation that the library does not provide.

Adds go/idna-ip-literal-smuggle under go/ql/src/experimental/CWE-918/. The query detects callers that pass an untrusted hostname through golang.org/x/net/idna (UTS-46 NFKC mapping folds 100 non-ASCII Unicode digit codepoints to ASCII) into a network sink without rechecking the post-IDNA value as an IP literal after a trailing-dot trim. Without the recheck, an input such as "0.[U+00B9].0.0" maps to "0.1.0.0" and slips past any pre-IDNA net.ParseIP guard, smuggling an IPv4 literal into net.JoinHostPort, net.Dial, http.Request.URL.Host, tls.Config.ServerName, http.Cookie.Domain, and the DNS resolver entry points. Implementation uses TaintTracking::GlobalWithState with two flow states (TPreIdna, TPostIdna). The IDNA mapping call is a state transition step. The barrier is a trailing-dot trim followed by net.ParseIP / net.ParseCIDR / netip.ParseAddr / netip.ParsePrefix in TPostIdna; a bare ParseIP without the prior trim does not sanitize because "0.1.0.0." is rejected by ParseIP yet remains a valid IP literal for routing. IPv6 is out of scope because the colon is a UTS-46 disallowed rune. Tests: 23 unique sink alerts on the positive fixture (which includes the canonical canonicalAddr-shape wrapper), no false positives on the negative fixture (safe pattern, Punycode profile, non-IDNA hostname use). codeql test run reports 1 of 1 PASSED.

Copilot

Pull request overview

Adds a new experimental Go security query for detecting IDNA UTS-46 digit-fold cases where untrusted hostnames become IPv4 literals after IDNA normalization and then flow into network-relevant sinks without a post-IDNA IP recheck.

Changes:

Adds a new stateful taint-tracking library/query under go/ql/src/experimental/CWE-918/ plus qhelp and example code.
Adds positive and negative inline-expectation fixtures covering sinks, barriers, and a witness-binding regression case.
Adds a dedicated Go test module and expected output for the new experimental query.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
`go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/positives.go`	Positive inline-expectation cases for vulnerable IDNA-to-sink flows.
`go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/negatives.go`	Negative and regression fixtures for safe post-IDNA rechecks and non-sink cases.
`go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/go.mod`	Standalone module/dependency setup for the new Go test fixture.
`go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/IdnaIpLiteralSmuggle.qlref`	Connects the test directory to the experimental query and inline postprocessors.
`go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/IdnaIpLiteralSmuggle.expected`	Generated expected output baseline for the inline-expectation test run.
`go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggleGood.go`	Safe example used by the query help documentation.
`go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggleBad.go`	Vulnerable example used by the query help documentation.
`go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qll`	Core stateful taint-tracking model for IDNA transitions, barriers, and sinks.
`go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.ql`	Query entry point and result metadata for the new path-problem query.
`go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qhelp`	End-user documentation, remediation guidance, and examples for the query.

+// Compliant: post-IDNA TrimSuffix + netip.ParseAddr recheck before (*Resolver).LookupIPAddr.
+func compliantResolverLookupIPAddr(req *http.Request) {
+	host := req.Header.Get("X-HOST-IPADDR-OK")
+	ace, err := idna.Lookup.ToASCII(host)
+	if err != nil {
+		return
+	}
+	candidate := strings.TrimSuffix(ace, ".")
+	if _, parseErr := netip.ParseAddr(candidate); parseErr == nil {
+		return
+	}
+	r := &net.Resolver{}
+	r.LookupIPAddr(context.Background(), ace) // OK: post-IDNA recheck barrier


+// --- Class 8: Devanagari digits (U+0966..U+096F) ---
+// UTS-46 NFKC folds Devanagari digits to ASCII equivalents.
+// "१.0.0.0" (U+0967 DEVANAGARI ONE) -> "1.0.0.0"
+func smuggleDevanagariDigit(req *http.Request) {
+	host := req.Header.Get("X-HOST-DEVANAGARI") // $ Source
+	ace, _ := idna.Lookup.ToASCII(host)
+	http.Get("https://" + ace + "/") // $ Alert
+}
+
+// --- Class 8 second positive: Devanagari digit U+0969 -> "3",
+// (*net.Resolver).LookupIPAddr sink ---
+// "१.0.0.३" -> "1.0.0.3"
+func smuggleDevanagariResolverLookupIPAddr(req *http.Request) {
+	host := req.Header.Get("X-HOST-DEVANAGARI-TWO") // $ Source
+	ace, _ := idna.Lookup.ToASCII(host)
+	r := &net.Resolver{}
+	r.LookupIPAddr(context.Background(), ace) // $ Alert
+}
+


+ * `(*idna.Profile).ToASCII` (and the package-level `idna.ToASCII`,
+ * `Lookup.ToASCII`, `MapForLookup().ToASCII`) is modeled as a


+	ace, _ := idna.Punycode.ToASCII(host)
+	http.Get("https://" + ace + "/") // OK: no digit-fold profile
+}
+


+Vulnerable pattern. <code>net.ParseIP</code> is called only before
+<code>idna.ToASCII</code>, so the smuggled literal slips through:


+that folds <strong>100 distinct non-ASCII Unicode digit codepoints</strong>
+across 8 families to their ASCII equivalents. The 8 families are:
+</p>
+<ul>
+  <li>Latin-1 superscripts (U+00B2, U+00B3, U+00B9): 3 codepoints</li>
+  <li>Mathematical superscripts (U+2070, U+2074..U+2079): 7 codepoints</li>
+  <li>Mathematical subscripts (U+2080..U+2089): 10 codepoints</li>
+  <li>Circled digits (U+2460..U+2468, U+24EA): 10 codepoints</li>
+  <li>Fullwidth digits (U+FF10..U+FF19): 10 codepoints</li>
+  <li>Mathematical bold, sans-serif, double-struck, and monospace digits
+      (U+1D7CE..U+1D7FF): 50 codepoints</li>
+  <li>Segmented digits (U+1FBF0..U+1FBF9): 10 codepoints</li>


+single trailing dot and call <code>net.ParseIP</code> (or
+<code>netip.ParseAddr</code>) on the result, then reject on non-nil. The
+trailing-dot trim is required because <code>"0.¹.0.0."</code> maps to
+<code>"0.1.0.0."</code>, which <code>net.ParseIP</code> rejects on its
+own yet is still an IP literal for routing purposes.


+The safe pattern accepts three equivalent trailing-dot trim forms:
+</p>
+<ul>
+  <li><code>strings.TrimRight(ace, ".")</code>: multi-dot form. Handles
+      the fullwidth and ideographic dot variants that produce multiple
+      trailing ASCII dots after UTS-46 mapping.</li>
+  <li><code>strings.TrimSuffix(ace, ".")</code>: single-dot form.
+      Sufficient for most inputs but incomplete for the multi-dot
+      variant.</li>
+  <li><code>if strings.HasSuffix(ace, ".") { ace = ace[:len(ace)-1] }</code>:
+      manual slice form. Equivalent to <code>TrimSuffix</code> in
+      effect.</li>


+// True-negative: caller uses idna.Display for human rendering only; the
+// output never reaches a network sink in this function.
+func displayOnly(req *http.Request) {
+	host := req.Header.Get("X-HOST-DISPLAY")
+	disp, _ := idna.Display.ToUnicode(host)
+	_ = disp // OK: never reaches a sink


+ * @description An untrusted hostname flows through `golang.org/x/net/idna`
+ *              mapping (which folds 100 non-ASCII Unicode digit codepoints to
+ *              ASCII via UTS-46 NFKC) and reaches a security-relevant
+ *              hostname sink without a post-IDNA IP-literal recheck. A
+ *              caller that calls `net.ParseIP` only BEFORE `idna.ToASCII`
+ *              will accept a smuggled IPv4 literal such as `"0.¹.0.0"`
+ *              (which maps to `"0.1.0.0"`). Scope is IPv4 only because
+ *              IPv6 colons are rejected by IDNA rune-validation before


@description

Fix several drift issues between the prose and the predicates: - qhelp now lists the seven Unicode-block ranges that account for the 100 fold codepoints, replacing the prior "8 families" claim that did not match the bullet list. Adds an explicit note that Devanagari digits do not fold (verified empirically against golang.org/x/net/idna v0.53.0) and that Registration disallows every fold codepoint at rune validation. - qhelp recommendation corrects netip.ParseAddr semantics (parsed via err == nil, not via a non-nil return) and rewrites the trim-form list to state that TrimSuffix and the manual slice are not equivalent to TrimRight for multi-trailing-dot inputs. - qhelp example caption matches the actual sample, which shows the no-recheck shape, not a pre-IDNA ParseIP shape. - qll module docstring now covers both ToASCII and ToUnicode on the digit-folding profiles, lists the actual idna.New(MapForLookup) construction shape, and explicitly states that the package-level idna.ToASCII helper and the Punycode profile are excluded. - ql @description and select message reflect that the model covers both ToASCII and ToUnicode and that the no-recheck case is the primary anti-pattern, not just the pre-IDNA-ParseIP case.

Three fixture changes plus a baseline refresh: - Drop the two Devanagari positives. Empirical testing against golang.org/x/net/idna v0.53.0 confirms that U+0966..U+096F do not fold to ASCII via UTS-46; they pass through Punycode (xn--*) on all four profiles. Keeping the cases in would be misleading because the query fires structurally without the runtime smuggle ever existing. - Add two positives covering Profile.ToUnicode on Latin-1 and Math superscript inputs. The library runs validateAndMap before the encode-vs-decode branch, so ToUnicode produces the same digit-folded ASCII output as ToASCII for the in-scope codepoints. The earlier fixture only exercised ToASCII despite the model handling both. - Add three negatives. Two pin the ParseCIDR and ParsePrefix branches of the recheck-input predicate, which had no sink-reaching coverage before. The third pins the documented exclusion of the package-level idna.ToASCII helper against future broadening of the call matcher. Baseline refreshed via codeql test run --learn after the fixture changes shifted line numbers and the new select message text replaced the old one.

astrogilda · 2026-05-04T14:13:37Z

Pushed two follow-ups for the review pass.

docs: align qhelp/qll/ql with implemented model (6392c52)

qhelp recommendation: corrected netip.ParseAddr semantics (parsed via err == nil, not via a non-nil return).
qhelp example caption: matches the actual IdnaIpLiteralSmuggleBad.go sample, which shows the no-recheck shape, not a pre-IDNA ParseIP shape.
qhelp trim-form list: states explicitly that TrimSuffix and the manual slice form are not equivalent to TrimRight for multi-trailing-dot inputs.
qhelp family count: replaced "8 families" with the seven Unicode-block ranges that account for the 100 fold codepoints. Added a note that Devanagari digits are out of scope, verified empirically against golang.org/x/net/idna v0.53.0.
qll header: now covers both ToASCII and ToUnicode on the digit-folding profiles, lists the actual idna.New(idna.MapForLookup(), ...) construction shape, and explicitly states that the package-level idna.ToASCII helper is excluded.
ql @description and select message: cover both ToASCII and ToUnicode, frame the no-recheck case as the primary anti-pattern.

test: tighten fixture coverage and refresh baseline (771b433)

Dropped the two Devanagari positives. Empirical probe against golang.org/x/net/idna v0.53.0: U+0966..U+096F do not fold under any of the four UTS-46 profiles; they pass through Punycode (xn--*). Keeping the cases would have been misleading because the query fired structurally without the runtime smuggle ever existing.
Added two ToUnicode positives (Latin-1 superscript on Lookup, Math superscript on Display). The library runs validateAndMap before the encode-vs-decode branch, so ToUnicode produces the same digit-folded ASCII output as ToASCII for in-scope codepoints.
Added safe-pattern negatives for net.ParseCIDR and netip.ParsePrefix. Both were modeled in the recheck-input predicate but had no sink-reaching coverage before.
Added a negative for the package-level idna.ToASCII helper to pin the documented exclusion against future broadening of the call matcher.

Baseline refreshed via codeql test run --learn after the fixture changes shifted line numbers and the new select message text replaced the old one.

One precision gap I noticed during the round but did not pull into this commit, for transparency: the idnaMappingCall Punycode exclusion is syntactic and would miss aliased forms like pp := idna.Punycode; pp.ToASCII(...). A localFlow-from-idna.Punycode form would catch them. Marginal in practice. I can fold it into a third commit on this branch if you prefer.

github-actions · 2026-05-05T12:33:30Z

QHelp previews:

go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qhelp

IDNA digit-fold IP-literal smuggling via UTS-46 NFKC mapping

The Go module golang.org/x/net/idna implements UTS-46 IDNA processing. On the Lookup and Display profiles (and any profile constructed via idna.New(idna.MapForLookup(), ...)), both (*Profile).ToASCII and (*Profile).ToUnicode apply an NFKC-based character map that folds 100 distinct non-ASCII Unicode digit codepoints to their ASCII equivalents. The 100 codepoints partition into the following Unicode-block ranges:

Latin-1 superscripts (U+00B2, U+00B3, U+00B9): 3 codepoints
Mathematical superscripts (U+2070, U+2074..U+2079): 7 codepoints
Mathematical subscripts (U+2080..U+2089): 10 codepoints
Circled digits (U+2460..U+2468, U+24EA): 10 codepoints
Fullwidth digits (U+FF10..U+FF19): 10 codepoints
Mathematical Alphanumeric Symbols digits, spanning bold, double-struck, sans-serif, sans-serif-bold, and monospace styles (U+1D7CE..U+1D7FF): 50 codepoints
Segmented digits (U+1FBF0..U+1FBF9): 10 codepoints
Devanagari digits (U+0966..U+096F) are not in scope: empirical testing against golang.org/x/net/idna v0.53.0 confirms they do not fold to ASCII via UTS-46. The Registration profile is structurally covered by the rule but disallows every fold codepoint at the rune-validation stage, so a caller that respects the returned error never sees a smuggled literal from that profile in practice.

The library contains no IP-literal detection. A caller that applies UTS-46 mapping to an attacker-controlled host string and consumes the result in a network sink without rechecking against IP-literal parsers receives a valid ASCII IPv4 literal back as the "domain name" output. Any downstream allowlist check, SSRF guard, NoProxy match, or TLS-SNI router that does not re-check the post-IDNA result is bypassed. The anti-pattern also applies to callers that do a pre-IDNA net.ParseIP check and think it is sufficient: the smuggled host is not ASCII, so the pre-IDNA check rejects it as non-IP, and the post-IDNA value (now a numeric literal) reaches the sink unguarded.

IPv6 is out of scope: : is a UTS-46 disallowed character; bare-IPv6 inputs are rejected by IDNA rune-validation before any digit-fold mapping runs.

Sinks where the smuggled literal becomes exploitable include net.JoinHostPort, net.Dial, (*http.Request).URL.Host, (*tls.Config).ServerName, (*http.Cookie).Domain, and any HTTP client request URL constructed from the mapped value.

Recommendation

Either:

Use a strict IDNA profile option that returns an error if the mapped output parses as an IP literal, if your IDNA library exposes one.
Apply the explicit safe pattern: after the IDNA mapping call, strip trailing dots from the result and parse it. Reject if net.ParseIP returns a non-nil address, or if netip.ParseAddr returns no error (note the inverted convention: netip.ParseAddr reports a successfully parsed address via err == nil, not via a non-zero return). The trailing-dot strip is required because "0.¹.0.0." maps to "0.1.0.0.", which a bare net.ParseIP rejects on its own yet is still an IP literal for routing purposes; the strip exposes the literal so the parser sees it.

Example

Vulnerable pattern. The host string is mapped through the IDNA profile and reaches a network sink with no post-IDNA IP-literal recheck:

package main

import (
	"net"
	"net/http"
	"net/url"

	"golang.org/x/net/idna"
)

// VulnerableLookup mirrors the shape of the anti-pattern as it appears in
// real Go code: an attacker-controlled host string is canonicalised through
// idna.Lookup.ToASCII, the result is consumed by a network sink, and there
// is no post-IDNA recheck against IP-literal parsers. UTS-46 NFKC mapping
// folds 100 non-ASCII digit codepoints (e.g. fullwidth, mathematical
// superscripts, circled, segmented) to their ASCII equivalents, so an input
// like "0.¹.0.0" emerges from ToASCII as "0.1.0.0" and reaches the sink
// as a routable IPv4 literal.
func VulnerableLookup(host string) (*http.Response, error) {
	ace, err := idna.Lookup.ToASCII(host)
	if err != nil {
		return nil, err
	}
	return http.Get("https://" + ace + "/")
}

// VulnerableProxyRoute mirrors the canonicalAddr shape used in callers that
// canonicalise a URL host before applying network policy. The host is read
// from an attacker-controlled URL, mapped through an idna.Lookup.ToASCII
// wrapper, and passed to net.JoinHostPort without an IP-literal recheck.
func VulnerableProxyRoute(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	addr := u.Hostname()
	if v, err := idna.Lookup.ToASCII(addr); err == nil {
		addr = v
	}
	return net.JoinHostPort(addr, "443"), nil
}

Safe pattern. Post-IDNA trailing-dot strip followed by net.ParseIP recheck:

package main

import (
	"net"
	"net/http"
	"strings"

	"golang.org/x/net/idna"
)

// SafeLookup applies the safe pattern: a post-IDNA trailing-dot trim
// followed by net.ParseIP. The trim is required because "0.¹.0.0." maps
// to "0.1.0.0." which net.ParseIP rejects on its own yet is still
// routable as 0.1.0.0 in the rest of the stack.
func SafeLookup(host string) (*http.Response, error) {
	ace, err := idna.Lookup.ToASCII(host)
	if err != nil {
		return nil, err
	}

	// Post-IDNA trailing-dot trim, then re-check. TrimRight (not
	// TrimSuffix) handles multiple trailing dots that UTS-46 mapping can
	// produce when fullwidth/ideographic dots compose with ASCII dots.
	candidate := strings.TrimRight(ace, ".")
	if ip := net.ParseIP(candidate); ip != nil {
		return nil, errBadHost
	}

	return http.Get("https://" + ace + "/")
}

var errBadHost = errIPLiteral{}

type errIPLiteral struct{}

func (errIPLiteral) Error() string { return "ip literals not allowed" }

The safe pattern accepts three trailing-dot strip forms. They are not equivalent in coverage:

strings.TrimRight(ace, "."): strict form. Strips all trailing dots, so the multi-dot residue produced when UTS-46 maps the fullwidth dot U+FF0E or the ideographic dot U+3002 next to ASCII dots is fully removed.
strings.TrimSuffix(ace, "."): lenient form. Strips only one trailing dot. Sufficient for the canonical "0.1.0.0." shape but leaves residue if multiple trailing dots were produced by mapping.
if strings.HasSuffix(ace, ".") { ace = ace[:len(ace)-1] }: manual single-dot slice. Behaves identically to TrimSuffix in coverage and inherits the same multi-dot-residue limitation.
Callers whose threat model includes the multi-trailing-dot variant should prefer strings.TrimRight. After the strip, parse with netip.ParseAddr (preferred) or net.ParseIP and reject if the value parses as an IP literal (err == nil for the former, non-nil return for the latter).

References

Unicode Technical Standard JavaScript: Add library for HTML sanitizers #46 (IDNA Compatibility Processing): https://www.unicode.org/reports/tr46/
golang.org/x/net/idna package documentation: https://pkg.go.dev/golang.org/x/net/idna
WHATWG URL Standard, ends_in_a_number host parser check (prior art for IP-literal detection in URL parsers): https://url.spec.whatwg.org/#ends-in-a-number-checker
CWE-918: Server-Side Request Forgery (SSRF): https://cwe.mitre.org/data/definitions/918.html
CWE-020: Improper Input Validation: https://cwe.mitre.org/data/definitions/20.html
Common Weakness Enumeration: CWE-918.
Common Weakness Enumeration: CWE-20.

+    /**
+     * The IDNA mapping is modeled as a state-transition step:
+     *   `TPreIdna(arg) -> TPostIdna(result)`
+     */


+    /**
+     * A correct post-IDNA IP-literal recheck (trailing-dot trim FOLLOWED
+     * BY `net.ParseIP` or equivalent) is a barrier in `TPostIdna`. The
+     * trim source is bound to the post-IDNA-tainted predecessor so that
+     * an unrelated TrimRight + ParseIP construct elsewhere in the same
+     * scope does not silently sanitize the IDNA-tainted path. A bare
+     * `net.ParseIP` without the prior trim is NOT a barrier; the alert
+     * remains.
+     */


+     */
+    predicate isBarrier(DataFlow::Node node, FlowState state) {
+      state = TPostIdna() and
+      exists(DataFlow::Node postIdnaResult, DataFlow::Node parseInput |


Collapses a single double-space before an inline comment that gofmt normalizes to one space. No semantic change. Fixes the make check-formatting CI step that flagged the file.

Co-authored-by: Owen Mansel-Chan <62447351+owen-mc@users.noreply.github.com>

owen-mc · 2026-05-21T10:08:44Z

While running tests we get the following errors:


Extractor command failed in /home/runner/work/semmle-code/semmle-code/ql/go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle
A 'codeql test extract' operation failed with error code 2
Command was: /home/runner/work/semmle-code/semmle-code/target/intree/codeql-go/go/tools/index.sh
[2/2] [89/496] FAILED(EXTRACTION) /home/runner/work/semmle-code/semmle-code/ql/go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/IdnaIpLiteralSmuggle.qlref
Output of failing command:
A 'codeql test extract' operation failed with error code 2
[2/2] [90/496] FAILED(EXTRACTION) /home/runner/work/semmle-code/semmle-code/ql/go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/DB-CHECK
2026/05/20 13:11:05 Build flags: '-mod=vendor'; patterns: './...'
A 'codeql test extract' operation failed with error code 2
2026/05/20 13:11:05 Running packages.Load.
[2/2] [91/496 comp 14.2s] FAILED(EXTRACTION) /home/runner/work/semmle-code/semmle-code/ql/go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/CONSISTENCY/UnexpectedFrontendErrors.ql
A 'codeql test extract' operation failed with error code 2
[2/2] [92/496] FAILED(EXTRACTION) /home/runner/work/semmle-code/semmle-code/ql/go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/CONSISTENCY/DataFlowConsistency.ql
2026/05/20 13:11:05 Error running go tooling: err: exit status 1: stderr: go: inconsistent vendoring in /home/runner/work/semmle-code/semmle-code/ql/go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle:
	golang.org/x/net@v0.53.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt
	golang.org/x/text@v0.36.0: is explicitly required in go.mod, but not marked as explicit in vendor/modules.txt

	To ignore the vendor directory, use -mod=readonly or -mod=mod.
	To sync the vendor directory, run:
		go mod vendor

(end of extractor output) Exit code: 1
Could not extract a dataset in /home/runner/work/semmle-code/semmle-code/ql/go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle: A 'codeql test extract' operation failed with error code 2

The fixture required golang.org/x/net and golang.org/x/text in go.mod but had no vendor directory, so the Go extractor's -mod=vendor build failed with an inconsistent-vendoring error. Add the vendored modules and go.sum so test extraction succeeds.

astrogilda · 2026-06-17T07:50:32Z

@owen-mc thanks for running the tests and catching this. The failure was an inconsistent-vendoring error: the fixture declared golang.org/x/net and golang.org/x/text in go.mod but shipped no vendor/ tree, so the extractor's -mod=vendor build couldn't resolve them. I pushed a463f43, which vendors both modules and adds go.sum, the same way the sibling CWE-918 fixtures handle their deps. go list -mod=vendor ./... now resolves cleanly against the fixture. Could you re-run the suite when you get a chance?

astrogilda · 2026-06-17T07:51:31Z

cc @henrymercer @hvitved for visibility. The vendoring fix for the test failure owen-mc flagged is now in (a463f43), so this should be ready for a re-run when one of you has a cycle. Glad to adjust anything on the query itself.

astrogilda requested a review from a team as a code owner May 4, 2026 06:43

Copilot AI review requested due to automatic review settings May 4, 2026 06:43

github-actions Bot added documentation Go labels May 4, 2026

Copilot started reviewing on behalf of astrogilda May 4, 2026 06:44 View session

Copilot AI reviewed May 4, 2026

View reviewed changes

astrogilda added 2 commits May 4, 2026 11:08

github-advanced-security AI found potential problems May 5, 2026

View reviewed changes

style: gofmt negatives.go fixture

611b7cc

Collapses a single double-space before an inline comment that gofmt normalizes to one space. No semantic change. Fixes the make check-formatting CI step that flagged the file.